Abstract

In order to evaluate the quality of scientific research, we introduce
a new family of scientific performance measures, called Scientific Research
Measures (SRM). Our proposal originates from recent developments
in the theory of risk measures and is an attempt to resolve
many of the problems of the existing bibliometric indices.

The SRM that we introduce are based on the scientist's whole citation
record and are: coherent, as they share the same structural properties;
flexible, to fit the peculiarities of different areas and seniorities; granular, as
they allow a more precise comparison between scientists; and inclusive, as
they encompass several popular indices.

Another key feature of our SRM is that they are designed to be calibrated
to the particular scientific community. We also propose a dual
formulation of this problem and explain its relevance in this context.

Keywords: Bibliometric Indices, Citations, Risk Measures, Scientific Impact
Measures, Calibration, Duality.

1 Introduction 

In recent years the evaluation of scientists' performance has become
increasingly important. In fact, most crucial decisions regarding faculty
recruitment, research projects, research time, academic promotion, travel money
and the award of grants depend to a great extent upon the scientific merits of the
researchers involved.

The purpose of the valuation of scientific research is twofold:

• Provide an updated picture of the existing research activity, in order to
allocate financial resources in relation to scientific quality and scientific
production;

• Bring about an increase in the quality of the scientific research (of the
structures).
Even though both aims seem quite obvious, it is worthwhile to emphasize
that the selection of erroneous valuation criteria (one trivial example would be
"the number of publications") could have an important negative impact on
future research quality. The methodologies for the valuation can be divided into
two categories:

• content valuation, based on the judgments of internal committees and the
external reviews of peer panels;

• context valuation, based on bibliometrics (i.e. statistics derived from
citation data) and the characteristics of the Journals associated with the
publications.

Economic considerations strongly favor using the context method on a
systematic (yearly) basis, while peer review is more plausible on a multi-year
basis and should also be aimed at checking, harmonizing and tuning the outcomes
based on bibliometric indices.

Thanks also to the wider availability of online databases (e.g. Google
Scholar, ISI Web of Science, MathSciNet, Scopus), several different bibliometric
measures have recently been introduced and applied.

There are several criticisms, such as those clearly underlined by the Citation
Statistics Report of the International Mathematical Union (2008) [CIT], of the
use of citations as a key factor in the assessment of the quality of research.
However, many of these criticisms can be satisfactorily addressed and our proposal
is one reasonable way to achieve this.

We agree that the quality of scientific research cannot be
reduced to citations, but we also believe that the information embedded
in citations should be properly quantified and should be one
component of the valuation of the quality of scientific research.

We emphasize that the output of the valuation is the classification of authors
(and structures) into a few merit classes of homogeneous research quality: it is not
intended to provide a fine ranking. In the Appendix we give a brief summary
of the pros and cons of bibliometric indices and of the peer review process.

In 2005 Hirsch [H05] proposed the h-index, which is now the most popular and
widely used citation-based metric. The h-index of an author is defined as the largest
number $h \in \mathbb{N}$ such that h distinct publications of the author
have at least h citations each. The h-index is a vague attempt to measure at the
same time the production in terms of number of publications and the research
quality in terms of citations per publication. Our approach aims precisely to take
better account of the balance between these two components.

After its introduction, the h-index received wide attention from the scientific
community and it has been extended by many authors who proposed other
indices (for an overview see Alonso et al., 2009 [ACHH]) in order to overcome
some of its drawbacks (see Bornmann and Daniel, 2007 [BD07]).

In this paper we introduce three novel features in the methodology regarding 
the measurement of the quality of scientific research: 
1. The coherency of the research measures

2. The calibration technique 

3. The dual setting. 

1. Unlike any existing approach, our formulation clearly stems
from the Theory of Risk Measures. The axiomatic approach developed
in the seminal paper by Artzner et al. [ADEH99] turned out to be, in the last
decade, very influential for the theory of risk measures: instead of focusing on
some particular measurement of the risk carried by financial positions (the
variance, the V@R, etc.), [ADEH99] proposed a class of measures satisfying
some reasonable properties (the "coherent" axioms). Ideally, each institution
could select its own risk measure, provided it obeyed the structural coherent
properties. This approach added flexibility in the selection of the risk measure
and, at the same time, established a unified framework.

We propose the same approach in order to determine a good class of scientific 
performance measures, that we call Scientific Research Measures (SRM). 

The theory of coherent risk measures was later extended to the class of convex
risk measures (Föllmer and Schied [FS02], Frittelli and Rosazza [FR02]).
The origin of our proposal can be traced to the more recent development of
this theory, leading to the notion of quasi-convex risk measures introduced by
Cerreia-Vioglio et al. [CMMM] and further developed in the dynamic framework
by Frittelli and Maggis [FM11]. Additional papers in this area include: Cherny
and Madan [CM09], who introduced the concept of an Acceptability Index having
the property of quasi-concavity; and Drapeau and Kupper [DK10], where the
correspondence between a quasi-convex risk measure and the associated family
of acceptance sets, already present in [CM09], is fully analyzed.

The representation of quasi-convex monotone maps in terms of families of
acceptance sets, as well as their dual formulations, are the key mathematical
tools underlying our definition of SRM.

2. A second feature of our approach is that our SRM are designed to be
"calibrated from the market data", a typical feature of modeling in finance. As
explained in Section 4, we calibrate the SRM from the historical data that are
available for one particular scientific area and seniority. In this way, each SRM
will appropriately fit the characteristics of the research field and seniority under
consideration.

3. Our third innovation in this area is provided by the dual approach to the
valuation of the quality of scientific research. As explained in Section 3.2,
we establish a duality between the primal space, the space of random variables
(representing the citation records) defined on the set of Journals, and its dual
space, the space of the "Arrow-Debreu prices" of each Journal, which could be
given by the impact factor of the Journal. In Section 3, we discuss this duality
and show that our SRM fit very well in this framework.

We finally report some empirical results obtained by calibrating the performance
curves to a specific data set.

To summarize, we propose a family of SRMs that are: 

• coherent, as they share the same structural properties, based on an
axiomatic approach;

• calibrated to the particular scientific community;

• flexible, in order to fit the peculiarities of different areas and ages;

• robust, as they can be defined, via duality, through a set of probabilities
representing the "value" of each Journal;

• granular, as they allow a more precise comparison between scientists;

• inclusive, as they encompass several popular indices.

2 On a class of Scientific Research Measures

We represent each author by a vector X of citations, where the i-th component of
X represents the number of citations of the i-th publication and the components
of X are ranked in decreasing order. We consider the whole citation curve of
an author as a decreasing bounded step function X (see Fig. 1) in the convex
cone:

$$\mathcal{X}^+ = \{ X : \mathbb{R} \to \mathbb{R}_+ \mid X \text{ is bounded, with only a finite number of values, decreasing on } \mathbb{R}_+ \text{ and such that } X(x) = 0 \text{ for } x \le 0 \}.$$
Figure 1: Author's Citation Curve
We compare the citation curve X of an author with a theoretical citation
curve $f_q$ representing the desired citations at a fixed performance level q. For
this purpose we introduce the following class of curves. Let $\mathcal{I} \subseteq \mathbb{R}$ be the index
set of the performance level. For any $q \in \mathcal{I}$ we define the theoretical performance
curve of level q as a function $f_q : \mathbb{R} \to \mathbb{R}_+$ that associates to each publication
$x \in \mathbb{R}$ the corresponding number of citations $f_q(x) \in \mathbb{R}_+$.

Definition 1 (Performance curves) Given an index set $\mathcal{I} \subseteq \mathbb{R}$ of performance
levels $q \in \mathcal{I}$, a class $\mathbb{F} := \{f_q\}_{q \in \mathcal{I}}$ of functions $f_q : \mathbb{R} \to \mathbb{R}_+$ is a
family of performance curves if

i) $\{f_q\}$ is increasing in q, i.e. if $q \ge p$ then $f_q(x) \ge f_p(x)$ for all x;

ii) for each q, $f_q(x)$ is left continuous in x;

iii) $f_q(x) = 0$ for all $x \le 0$ and all q.

The main feature of these curves is that a higher performance level implies
a higher number of citations. This family of curves is crucial for our objective
of building a SRM able to encompass many of the popular indices and to be
calibrated to the scientific area and the seniority of the authors.

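Definition 2 (Performance sets and SRM) Given a family of performance
curves $\mathbb{F} = \{f_q\}_q$, we define the family of performance sets $\mathcal{A}_{\mathbb{F}} := \{\mathcal{A}_q\}_q$ by

$$\mathcal{A}_q := \{X \in \mathcal{X}^+ \mid X(x) \ge f_q(x) \text{ for all } x \in \mathbb{R}\}.$$

The Scientific Research Measure (SRM) is the map $\phi_{\mathbb{F}} : \mathcal{X}^+ \to \mathbb{R}$ associated to
$\mathbb{F}$ and $\mathcal{A}_{\mathbb{F}}$ given by

$$\phi_{\mathbb{F}}(X) := \sup\{q \in \mathcal{I} \mid X \in \mathcal{A}_q\} = \sup\{q \in \mathcal{I} \mid X(x) \ge f_q(x) \text{ for all } x \in \mathbb{R}\}. \qquad (1)$$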




Figure 2: Determination of a particular SRM. h-index equal to 4.

The SRM $\phi_{\mathbb{F}}$ is obtained by the comparison between the real citation curve
of an author X (the red line in Fig. 2) and the family $\mathbb{F}$ of performance curves
(the blue line in Fig. 2): $\phi_{\mathbb{F}}(X)$ is the greatest level q of the performance
curve $f_q$ below the author's citation curve X.
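
The comparison described above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: curves are compared only at integer publication ranks i = 1, 2, ..., the level q ranges over a finite grid of integers, and each performance curve is assumed to vanish beyond rank max(levels). The names `srm` and `h_curve` are hypothetical; `h_curve` encodes the staircase of Fig. 2 (q citations required on each of the first q publications).

```python
from typing import Callable, Iterable, Optional, Sequence


def srm(citations: Sequence[int],
        curve: Callable[[int, int], float],
        levels: Iterable[int]) -> Optional[int]:
    """Largest level q in `levels` whose curve lies below the citation curve.

    Assumption: `curve(q, i)` is the value of f_q at the integer rank i,
    and f_q is zero beyond rank max(levels).
    """
    x = sorted(citations, reverse=True)        # citation curve, decreasing
    levels = sorted(levels)
    horizon = max(len(x), max(levels)) + 1     # last rank that is checked

    def x_at(i: int) -> int:
        return x[i - 1] if i <= len(x) else 0  # no publication: zero citations

    best = None
    for q in levels:
        # q is acceptable when the citation curve dominates f_q at every rank
        if all(x_at(i) >= curve(q, i) for i in range(1, horizon + 1)):
            best = q
    return best


def h_curve(q: int, i: int) -> int:
    # h-index family: q citations required on each of the first q ranks
    return q if i <= q else 0


citations = [10, 8, 5, 4, 3]
print(srm(citations, h_curve, range(0, 12)))   # 4
```

Changing only the `curve` argument yields a different measure, which mirrors the role played by the family of performance curves in Definition 2.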
2.1 Some examples of existing SRMs



The previous definition points out the importance of the family of theoretical
performance curves for the determination of the SRM. It is clear that different
choices of $\mathbb{F} := \{f_q\}_q$ lead to different SRM $\phi_{\mathbb{F}}$. The following examples show
that some well known indices of scientific performance are particular cases of
our SRM. In the following examples, if X has $p \ge 1$ publications that received
at least one citation, we set: $X = \sum_{i=1}^{p} x_i \mathbf{1}_{(i-1,i]}$, with $x_i \ge x_{i+1} \ge 1$ for all i,
and p satisfies $X = X\mathbf{1}_{(0,p]}$.

Example 3 (max # of citations) The maximum number of citations of the
author's most cited paper is the SRM $\phi_{\mathbb{F}_c}$ defined by (1), where the
performance curves are: $f_q = q\mathbf{1}_{(0,1]}$.



Example 4 (total number of publications) The total number of publications
with at least one citation is the SRM $\phi_{\mathbb{F}_p}$ defined by (1), where the
performance curves are: $f_q = \mathbf{1}_{(0,q]}$.
Example 5 (h-index) The h-index defined by Hirsch [H05] may be rewritten
in our setting. Indeed, the h-index is the SRM $\phi_{\mathbb{F}_h}$ defined by (1), where the
performance curves are: $f_q = q\mathbf{1}_{(0,q]}$.
Example 6 ($h^2$-index) According to Kosmulski, 2006 [K06], a scientist has
$h^2$-index q if q of his n papers have at least $q^2$ citations each and the other $n-q$
papers have fewer than $q^2$ citations each. This index is the SRM $\phi_{\mathbb{F}_{h^2}}$ defined by
(1), where the performance curves are: $f_q = q^2\mathbf{1}_{(0,q]}$.

Example 7 ($h_\alpha$-index) Eck and Waltman, 2008 [EW08] proposed the $h_\alpha$-index
as a generalization of the h-index defined as: "a scientist has $h_\alpha$-index
$h_\alpha$ if $h_\alpha$ of his n papers have at least $\alpha \cdot h_\alpha$ citations each and the other $n - h_\alpha$
papers have fewer than $\alpha \cdot h_\alpha$ citations each". Hence, the $h_\alpha$-index is the SRM
$\phi_{\mathbb{F}_{h_\alpha}}$ defined by (1), where the performance curves are: $f_q = \alpha q\mathbf{1}_{(0,q]}$, $\alpha \in (0,\infty)$.

Example 8 (w-index) Woeginger, 2008 [WOE08] introduced the w-index defined
as: "a w-index of at least k means that there are k distinct publications
that have at least 1, 2, 3, 4, ..., k citations, respectively". It is the SRM $\phi_{\mathbb{F}_w}$
defined by (1), where the performance curves are: $f_q(x) = (-x + q + 1)\mathbf{1}_{(0,q]}(x)$.

Example 9 ($h_{rat}$-index & $h_r$-index) The rational and the real h-index, $h_{rat}$-index
and $h_r$-index, introduced respectively by Ruane and Tol, 2008 [RT08] and
Guns and Rousseau, 2009 [GR09], are SRMs; indeed they can be defined as the
h-index but taking respectively $q \in \mathcal{I} \subseteq \mathbb{Q}$ and $q \in \mathcal{I} \subseteq \mathbb{R}$.

2.2 Key properties of the SRM 

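Proposition 10 Let $\mathbb{F}$ be a family of performance curves, $\mathcal{A}_{\mathbb{F}} = \{\mathcal{A}_q\}_q$ be
the associated family of performance sets and $\phi_{\mathbb{F}}$ be the associated SRM. Let
$X_1, X_2 \in \mathcal{X}^+$. Then:

1. i) $\{\mathcal{A}_q\}$ is decreasing monotone: $\mathcal{A}_q \subseteq \mathcal{A}_p$ for any level $q \ge p$;

ii) for any q, $\mathcal{A}_q$ is monotone: $X_1 \in \mathcal{A}_q$ and $X_2 \ge X_1$ implies $X_2 \in \mathcal{A}_q$;

iii) for any q, $\mathcal{A}_q$ is convex: if $X_1, X_2 \in \mathcal{A}_q$ then $\lambda X_1 + (1-\lambda)X_2 \in \mathcal{A}_q$,
$\lambda \in [0,1]$.

2. i) $\phi_{\mathbb{F}}$ is monotone increasing: if $X_1 \le X_2$ then $\phi_{\mathbb{F}}(X_1) \le \phi_{\mathbb{F}}(X_2)$;

ii) $\phi_{\mathbb{F}}$ is quasi-concave: $\phi_{\mathbb{F}}(\lambda X_1 + (1-\lambda)X_2) \ge \min(\phi_{\mathbb{F}}(X_1), \phi_{\mathbb{F}}(X_2))$,
$\lambda \in [0,1]$.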

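Proof.

1) The proof of the monotonicity and convexity of $\mathcal{A}_{\mathbb{F}}$ follows from the definition.

2.i) If $X_1 \le X_2$, then $X_1 \ge f_q$ implies $X_2 \ge f_q$. Hence
$\{q \in \mathcal{I} \mid X_1 \ge f_q\} \subseteq \{q \in \mathcal{I} \mid X_2 \ge f_q\}$.

2.ii) Let $\phi_{\mathbb{F}}(X_1) \ge m$ and $\phi_{\mathbb{F}}(X_2) \ge m$. By definition of $\phi_{\mathbb{F}}$, $\forall \varepsilon > 0$ $\exists q_i$ s.t.
$X_i \ge f_{q_i}$ and $q_i > \phi_{\mathbb{F}}(X_i) - \varepsilon \ge m - \varepsilon$. Then $X_i \ge f_{q_i} \ge f_{m-\varepsilon}$, as $\{f_q\}_q$
is an increasing family, and therefore $\lambda X_1 + (1-\lambda)X_2 \ge f_{m-\varepsilon}$. As this
holds for any $\varepsilon > 0$, we conclude that $\phi_{\mathbb{F}}(\lambda X_1 + (1-\lambda)X_2) \ge m$ and $\phi_{\mathbb{F}}$
is quasi-concave.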



It is obviously reasonable that a SRM should be increasing: if the citations
of a researcher $X_2$ dominate the citations of another researcher $X_1$, publication
by publication, then $X_2$ has a performance greater than $X_1$.

Example 11 We show that a SRM is not in general quasi-convex, that is, the
inequality $\phi_{\mathbb{F}}(\lambda X_1 + (1-\lambda)X_2) \le \max(\phi_{\mathbb{F}}(X_1), \phi_{\mathbb{F}}(X_2))$ need not hold for all
$\lambda \in [0,1]$. Consider two vectors, $X_1 = [8\ 6\ 4\ 2]$ and $X_2 = [4\ 2\ 2\ 2\ 2]$, where $X_2$ has more
publications than $X_1$ but they are less cited. Computing, for example, the w-index we
obtain that $\phi_{\mathbb{F}_w}(X_1) = 4$ and $\phi_{\mathbb{F}_w}(X_2) = 3$, while for the convex combination
$X = \frac{1}{2}X_1 + \frac{1}{2}X_2 = [6\ 4\ 3\ 2\ 1]$ we have: $\phi_{\mathbb{F}_w}(X) = 5$.
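
The numbers in this example can be checked with a few lines of Python. This is a minimal sketch that uses the verbal definition of the w-index quoted in Example 8 (k publications with at least k, k-1, ..., 1 citations); the function names `w_index` and `convex_combination` are hypothetical, and padding the shorter record with zeros is an assumption about how the two vectors are aligned.

```python
from itertools import zip_longest


def w_index(citations):
    """Largest k such that the i-th most cited paper has >= k - i + 1 citations, i = 1..k."""
    x = sorted(citations, reverse=True)
    k = 0
    # Extend from k to k + 1 as long as the staircase condition still holds.
    while k < len(x) and all(x[i] >= (k + 1) - i for i in range(k + 1)):
        k += 1
    return k


def convex_combination(a, b, lam=0.5):
    """Rank-by-rank combination; the shorter record is padded with zeros."""
    return [lam * u + (1 - lam) * v for u, v in zip_longest(a, b, fillvalue=0)]


x1 = [8, 6, 4, 2]
x2 = [4, 2, 2, 2, 2]
x = convex_combination(x1, x2)

print(x)                                      # [6.0, 4.0, 3.0, 2.0, 1.0]
print(w_index(x1), w_index(x2), w_index(x))   # 4 3 5
```

The combined record scores above both originals, so the quasi-convexity inequality fails, while the quasi-concavity bound of Proposition 10 (at least the minimum, here 3) is respected.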

2.3 Additional properties of SRMs 

We have seen that all the SRMs $\phi_{\mathbb{F}}$ share the same structural properties of
monotonicity and quasi-concavity. Of course other relevant properties of $\phi_{\mathbb{F}}$ could
be considered, which could also be built in from the corresponding features of
the family of performance curves. In this section we show that this is the case
for the behavior of $\phi_{\mathbb{F}}$ with respect to the addition of citations (C-additivity) to
existing papers.

Definition 12 A SRM $\phi_{\mathbb{F}} : \mathcal{X}^+ \to \mathbb{R}$ is:

a) C-superadditive if $\phi_{\mathbb{F}}(X + m) \ge \phi_{\mathbb{F}}(X) + m$ for all $m \in \mathbb{R}_+$;

b) C-subadditive if $\phi_{\mathbb{F}}(X + m) \le \phi_{\mathbb{F}}(X) + m$ for all $m \in \mathbb{R}_+$;

c) C-additive if $\phi_{\mathbb{F}}(X + m) = \phi_{\mathbb{F}}(X) + m$ for all $m \in \mathbb{R}_+$.

Definition 13 A family $\mathbb{F} = \{f_q\}_q$ of performance curves is:

a) slowly increasing in q if $f_{q+m} - f_q \le m$ for all $m \in \mathbb{R}_+$;

b) fast increasing in q if $f_{q+m} - f_q \ge m$ for all $m \in \mathbb{R}_+$;

c) linear increasing in q if $f_{q+m} - f_q = m$ for all $m \in \mathbb{R}_+$.

These properties of the family of performance curves can be expressed in terms
of corresponding properties of the family $\mathcal{A}_{\mathbb{F}}$ of performance sets.

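Lemma 14 The family $\mathbb{F}$ of performance curves is slowly (resp. fast, linear)
increasing in q, if and only if

$$\mathcal{A}_q + m \subseteq \mathcal{A}_{q+m} \quad (\text{resp. } \mathcal{A}_{q+m} \subseteq \mathcal{A}_q + m,\ \mathcal{A}_{q+m} = \mathcal{A}_q + m) \qquad (2)$$

for all $m \in \mathbb{R}_+$ and $q \in \mathcal{I}$.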

Proof. In order to show that $\mathcal{A}_q + m \subseteq \mathcal{A}_{q+m}$ we observe that:

$$\mathcal{A}_{q+m} := \{X \mid X \ge f_{q+m}\},$$

$$\mathcal{A}_q + m = \{X + m \mid X \ge f_q\} = \{X \mid X \ge f_q + m\}.$$

From $f_q + m \ge f_{q+m}$ we deduce that $X \ge f_q + m$ implies $X \ge f_{q+m}$. Hence

$$X \in \mathcal{A}_q + m \Rightarrow X \in \mathcal{A}_{q+m}.$$

Regarding the other implication, we know that if $X \in \mathcal{A}_q + m$ then $X \in \mathcal{A}_{q+m}$,
that is $X \ge f_q + m$ implies $X \ge f_{q+m}$. This implies that $f_q + m \ge f_{q+m}$.
Similarly for the other cases. ■

Lemma 15 If a family $\mathbb{F}$ of performance curves is slowly (resp. fast, linear)
increasing in q, then $\phi_{\mathbb{F}}$ is C-superadditive (resp. C-subadditive, C-additive).

Proof. In order to show that $\phi_{\mathbb{F}}(X + m) - m \ge \phi_{\mathbb{F}}(X)$ for all $m \in \mathbb{R}_+$ we
use the definition in (1) and we observe that

$$\phi_{\mathbb{F}}(X + m) - m = \sup\{q \mid X + m \ge f_q\} - m = \sup\{q - m \mid X \ge f_q - m\} = \sup\{q \mid X \ge f_{q+m} - m\}.$$

Hence it is sufficient to show that $\{q \mid X \ge f_q\} \subseteq \{q \mid X \ge f_{q+m} - m\}$ and this
is true since $f_q \ge f_{q+m} - m$. Similarly for the other cases. ■

As shown in the following examples, the reverse implication in the above 
Lemma does not hold true. 
Example 16

• The h-index in Example 5 is a C-subadditive SRM, but the associated
family $\mathbb{F}$ of performance curves defined in Section 2.1 is not fast increasing in q,
nor slowly increasing. Indeed the family is linear increasing on the Hirsch
core [0, h], but not outside it.

• The same considerations hold for the $h^2$- and $h_\alpha$-index (see Examples 6
and 7).

• The family $\mathbb{F}$ defined in Example 8, associated to the w-index, is slowly
increasing in q. This condition is sufficient to say that the w-index is a
C-superadditive SRM.

• The maximum number of citations of an article (see Example 3) is a
C-additive SRM, even if the associated family $\mathbb{F}$ of performance curves is
not linear increasing in q. This property holds only on [0,1], since the
performance curves are equal to zero outside.

• The total number of publications (see Example 4) is a C-superadditive SRM
since the family $\mathbb{F}$ of performance curves defined in Section 2.1 is slowly increasing
in q.

A further property concerns the addition of a single publication to the
author's citation record.



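Definition 17 Let p be the maximum number of publications with at least one
citation, so that p satisfies: $X = X\mathbf{1}_{(0,p]}$. A SRM $\phi_{\mathbb{F}} : \mathcal{X}^+ \to \mathbb{R}$ is

a) P-superadditive if $\phi_{\mathbb{F}}(X + \mathbf{1}_{\{p+1\}}) \ge \phi_{\mathbb{F}}(X) + 1$;

b) P-subadditive if $\phi_{\mathbb{F}}(X + \mathbf{1}_{\{p+1\}}) \le \phi_{\mathbb{F}}(X) + 1$;

c) P-additive if $\phi_{\mathbb{F}}(X + \mathbf{1}_{\{p+1\}}) = \phi_{\mathbb{F}}(X) + 1$;

d) P-invariant if $\phi_{\mathbb{F}}(X + \mathbf{1}_{\{p+1\}}) = \phi_{\mathbb{F}}(X)$.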

A SRM is P-superadditive if the addition of a new publication with one
citation increases the measure by at least one unit. Many known
SRMs are P-invariant (e.g. the $c_{max}$, h-, $h^2$- and $h_\alpha$-index in Examples
3, 5, 6 and 7), as the addition of one citation to a new publication leaves
the SRM invariant. The w-index (in Example 8) is P-subadditive, as the
addition of one citation to a new publication increases it by at most 1
unit. The total number of publications p with at least one citation (in
Example 4) is clearly P-additive.
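
These P-properties can be observed on a small record. The sketch below is illustrative only: it models the addition of $\mathbf{1}_{\{p+1\}}$ as appending one new paper with a single citation (an assumption about the discrete reading of Definition 17), and the helper names `h_index`, `w_index`, `n_cited` and `with_new_paper` are hypothetical.

```python
def h_index(citations):
    """Number of ranks i whose paper has at least i citations (record sorted decreasingly)."""
    x = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(x, start=1) if c >= rank)


def w_index(citations):
    """Largest k such that the i-th most cited paper has >= k - i + 1 citations, i = 1..k."""
    x = sorted(citations, reverse=True)
    k = 0
    while k < len(x) and all(x[i] >= (k + 1) - i for i in range(k + 1)):
        k += 1
    return k


def n_cited(citations):
    """Total number of publications with at least one citation."""
    return sum(1 for c in citations if c >= 1)


def with_new_paper(citations):
    """Record after adding one new publication that has a single citation."""
    return list(citations) + [1]


x = [8, 6, 4, 2]
y = with_new_paper(x)
for name, index in [("h", h_index), ("w", w_index), ("p", n_cited)]:
    print(name, index(x), index(y))
# h 3 3
# w 4 5
# p 4 5
```

On this record the h-index is unchanged, the w-index grows by one (within the P-subadditive bound), and the count of cited publications grows by exactly one.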
3 On the Dual Representation of the SRM



The goal of this section is to provide a dual representation of the SRM. To this
end, we need some topological structure. Let $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ be a probability
space, where $\mathcal{B}(\mathbb{R})$ is the $\sigma$-algebra of the Borel sets and $\mu$ is a probability measure
on $\mathcal{B}(\mathbb{R})$. Since the citation curve of an author X is a bounded function, it
appears natural to take $X \in L^\infty(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$, where $L^\infty(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ is the space
of $\mathcal{B}(\mathbb{R})$-measurable functions that are $\mu$-almost surely bounded. If we endow $L^\infty$
with the weak topology $\sigma(L^\infty, L^1)$ then $L^1 = (L^\infty, \sigma(L^\infty, L^1))'$ is its topological
dual. In the dual pairing $(L^\infty, L^1, \langle\cdot,\cdot\rangle)$ the bilinear form $\langle\cdot,\cdot\rangle : L^\infty \times L^1 \to \mathbb{R}$
is given by $\langle X, Z\rangle = E[ZX]$, the linear function $X \mapsto E[ZX]$, with $Z \in L^1$, is
$\sigma(L^\infty, L^1)$-continuous and $(L^\infty, \sigma(L^\infty, L^1))$ is a locally convex topological vector
space. In this framework, each element of a performance family $\mathbb{F} = \{f_q\}_q$ is a
$\mathcal{B}(\mathbb{R})$-measurable function, the inequalities between random variables are meant
to hold $\mu$-a.s., and we set:

$$\mathcal{A}_q := \{X \in L^\infty \mid X \ge f_q\},$$
$$\phi_{\mathbb{F}}(X) := \sup\{q \in \mathcal{I} \mid X \in \mathcal{A}_q\}. \qquad (3)$$

We have seen in Section 2.2 that the SRM is a quasi-concave and monotone
map. Under appropriate continuity assumptions, the dual representation of
this type of map can be found in [PV90], [Vo98], [CMMM].

Definition 18 A map $\phi : L^\infty(\mathbb{R}) \to \mathbb{R}$ is $\sigma(L^\infty, L^1)$-upper-semicontinuous if
the upper level sets

$$\{X \in L^\infty \mid \phi(X) \ge q\}$$

are $\sigma(L^\infty, L^1)$-closed for all $q \in \mathbb{R}$.

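Lemma 19 If $\mathcal{A}_{\mathbb{F}} = \{\mathcal{A}_q\}_q$ is a family of performance sets then $\mathcal{A}_q$ is
$\sigma(L^\infty, L^1)$-closed for any q.

Proof. To prove that $\mathcal{A}_q$ is $\sigma(L^\infty, L^1)$-closed let $Y_\alpha \in \mathcal{A}_q$ be a net satisfying
$Y_\alpha \xrightarrow{\sigma(L^\infty, L^1)} Y \in L^\infty$. By contradiction, suppose that $\mu(B) > 0$ where
$B := \{Y < f_q\} \in \mathcal{B}(\mathbb{R})$. Taking as a continuous linear functional $Z = \mathbf{1}_B \in L^1$, from
$Y_\alpha \xrightarrow{\sigma(L^\infty, L^1)} Y$ we deduce: $E[\mathbf{1}_B f_q] \le E[\mathbf{1}_B Y_\alpha] \to E[\mathbf{1}_B Y] < E[\mathbf{1}_B f_q]$. ■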

The following proposition shows the relation between the continuity property
of the family $\mathbb{F}$ of performance curves, those of the family $\mathcal{A}_{\mathbb{F}}$ of performance
sets and those of the SRM $\phi_{\mathbb{F}}$.

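Proposition 20 Let $\mathbb{F}$ be a family of performance curves. If $\mathbb{F}$ is left continuous
in q, that is

$$f_{q-\varepsilon}(x) \uparrow f_q(x) \text{ for } \varepsilon \downarrow 0, \quad \mu\text{-a.s.},$$

then:

1. $\mathcal{A}_{\mathbb{F}}$ is left-continuous in q, that is

$$\mathcal{A}_q = \bigcap_{\varepsilon > 0} \mathcal{A}_{q-\varepsilon};$$

2. $$\mathcal{A}_q = \{X \in L^\infty \mid \phi_{\mathbb{F}}(X) \ge q\}, \text{ for all } q \in \mathcal{I}; \qquad (4)$$

3. $\phi_{\mathbb{F}}$ is $\sigma(L^\infty, L^1)$-upper-semicontinuous.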
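Proof.

1. By assumption we have that $f_{q-\varepsilon}(x) \uparrow f_q(x)$ for $\varepsilon \to 0$, $\mu$-a.s. We have
proved in Proposition 10 that $\{\mathcal{A}_q\}$ is decreasing monotone, hence we
know that $\mathcal{A}_q \subseteq \bigcap_{\varepsilon>0} \mathcal{A}_{q-\varepsilon}$. We need to prove that $\bigcap_{\varepsilon>0} \mathcal{A}_{q-\varepsilon} \subseteq \mathcal{A}_q$. By
contradiction we suppose that there exists $X \in L^\infty$ such that $X \ge f_{q-\varepsilon}$
for every $\varepsilon > 0$ but $X(x) < f_q(x)$ on a set of positive measure $\mu$. Then
there exists a $\delta > 0$ such that $f_q(x) \ge X(x) + \delta$ on a measurable set B
with $b := \mu(B) \in (0,1]$. Since $f_{q-\varepsilon} \uparrow f_q$ we may find $\varepsilon > 0$ such that
$f_{q-\varepsilon}(x) > f_q(x) - \frac{\delta}{2}$ on a measurable set C with $\mu(C) > 1 - b$. Thus
$\mu(B \cap C) > 0$ and $X(x) \ge f_{q-\varepsilon}(x) > f_q(x) - \frac{\delta}{2} \ge X(x) + \frac{\delta}{2}$ on $B \cap C$,
which is a contradiction.

2. Now let

$$B_q := \{X \in L^\infty \mid \phi_{\mathbb{F}}(X) \ge q\}.$$

$\mathcal{A}_q \subseteq B_q$ follows directly from the definition of $\phi_{\mathbb{F}}$. We have to show that
$B_q \subseteq \mathcal{A}_q$. Let $X \in B_q$. Hence $\phi_{\mathbb{F}}(X) \ge q$ and, for all $\varepsilon > 0$, there exists
$\bar{q}$ such that $\bar{q} > \phi_{\mathbb{F}}(X) - \varepsilon \ge q - \varepsilon$ and $X \ge f_{\bar{q}}$. Since $\{f_q\}_q$ is increasing
in q we have that $X \ge f_{q-\varepsilon}$ for all $\varepsilon > 0$, therefore $X \in \bigcap_{\varepsilon>0} \mathcal{A}_{q-\varepsilon}$. By
item 1 and the left continuity in q of the family $\mathbb{F}$ we know that $\{\mathcal{A}_q\}$ is
left-continuous in q and so: $X \in \bigcap_{\varepsilon>0} \mathcal{A}_{q-\varepsilon} = \mathcal{A}_q$.

3. By Lemma 19 we know that $\mathcal{A}_q$ is $\sigma(L^\infty, L^1)$-closed for any q and
therefore the upper level sets $B_q = \mathcal{A}_q$ are $\sigma(L^\infty, L^1)$-closed and $\phi_{\mathbb{F}}$ is
$\sigma(L^\infty, L^1)$-upper semicontinuous. ■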

The next lemma will be applied in the proof of Theorem 22. It can be proved
in a way similar to the convex case (see for example [FS04]).


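Lemma 21 Let $\phi_{\mathbb{F}} : L^\infty \to \mathbb{R}$ be a SRM. Then the following are equivalent:

$\phi_{\mathbb{F}}$ is $\sigma(L^\infty, L^1)$-upper semicontinuous;

$\phi_{\mathbb{F}}$ is continuous from above: $X_n, X \in L^\infty$ and $X_n \downarrow X$ imply $\phi_{\mathbb{F}}(X_n) \downarrow \phi_{\mathbb{F}}(X)$.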

Proof. Let $\phi_{\mathbb{F}}$ be $\sigma(L^\infty, L^1)$-upper semicontinuous and suppose that $X_n \downarrow X$.
The monotonicity of $\phi_{\mathbb{F}}$ implies $\phi_{\mathbb{F}}(X_n) \ge \phi_{\mathbb{F}}(X)$ and $\phi_{\mathbb{F}}(X_n) \downarrow$, and therefore
$q := \lim_n \phi_{\mathbb{F}}(X_n) \ge \phi_{\mathbb{F}}(X)$. Hence $\phi_{\mathbb{F}}(X_n) \ge q$ and $X_n \in B_q := \{Y \in L^\infty \mid \phi_{\mathbb{F}}(Y) \ge q\}$,
which is $\sigma(L^\infty, L^1)$-closed by assumption. As the elements in $L^1$ are
order continuous, from $X_n \downarrow X$ we get $X_n \xrightarrow{\sigma(L^\infty, L^1)} X$ and therefore $X \in B_q$.
This implies that $\phi_{\mathbb{F}}(X) = q$ and that $\phi_{\mathbb{F}}$ is continuous from above.

Conversely, suppose that $\phi_{\mathbb{F}}$ is continuous from above. We have to show
that the convex set $B_q$ is $\sigma(L^\infty, L^1)$-closed for any q. By the Krein-Šmulian
Theorem it is sufficient to prove that $C := B_q \cap \{X \in L^\infty \mid \|X\|_\infty \le r\}$ is
$\sigma(L^\infty, L^1)$-closed for any fixed $r > 0$. As $C \subseteq L^\infty \subseteq L^1$ and as the embedding

$$(L^\infty, \sigma(L^\infty, L^1)) \hookrightarrow (L^1, \sigma(L^1, L^\infty))$$

is continuous it is sufficient to show that C is $\sigma(L^1, L^\infty)$-closed. Since the
$\sigma(L^1, L^\infty)$ topology and the $L^1$ norm topology are compatible, and C is convex,
it is sufficient to prove that C is closed in $L^1$. Take $X_n \in C$ such that $X_n \to X$
in $L^1$. Then there exists a subsequence $\{Y_n\}_n \subseteq \{X_n\}_n$ such that $Y_n \to X$
a.s. and $\phi_{\mathbb{F}}(Y_n) \ge q$ for all n. Set $Z_m := \sup_{n \ge m} Y_n \vee X$. Then $Z_m \in L^\infty$,
since $\{Y_n\}_n$ is uniformly bounded, and $Z_m \ge Y_m$, $\phi_{\mathbb{F}}(Z_m) \ge \phi_{\mathbb{F}}(Y_m)$ and $Z_m \downarrow X$.
From the continuity from above we conclude: $\phi_{\mathbb{F}}(X) = \lim_m \phi_{\mathbb{F}}(Z_m) \ge
\limsup_m \phi_{\mathbb{F}}(Y_m) \ge q$. Thus $X \in B_q$ and consequently $X \in C$. ■



When the family of performance curves $\mathbb{F}$ is left continuous, Proposition 20
shows that the SRM is $\sigma(L^\infty, L^1)$-upper semicontinuous. Hence we can provide
a dual representation for the SRM in the same spirit as [Vo98], [CMMM] and
[DK10]. In the following theorem we first provide the representation of $\phi_{\mathbb{F}}$ in
terms of the dual function H defined in (6) and then we show that $\phi_{\mathbb{F}}$ can also be
represented in terms of the right continuous version of H, which can be written
in a different way as in (7). This dual representation will provide an interesting
interpretation of the SRM (see Section 3.2).

Denote

$$\mathcal{P} := \{Q \ll P\} \quad \text{and} \quad \mathcal{Z} := \left\{Z = \frac{dQ}{dP} \mid Q \in \mathcal{P}\right\} = \{Z \in L^1_+ \mid E[Z] = 1\}.$$

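Theorem 22 Suppose that the family of performance curves $\mathbb{F}$ is left continuous.
Each SRM $\phi_{\mathbb{F}} : L^\infty(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu) \to \mathbb{R}$ defined in (3) can be represented
as

$$\phi_{\mathbb{F}}(X) = \inf_{Z \in \mathcal{Z}} H(Z, E[ZX]) = \inf_{Z \in \mathcal{Z}} H^+(Z, E[ZX]) \qquad (5)$$
$$= \inf_{Q \in \mathcal{P}} H^+(Q, E_Q[X]) \quad \text{for all } X \in L^\infty,$$

where $H : L^1 \times \mathbb{R} \to \mathbb{R}$ is defined by

$$H(Z,t) := \sup\{\phi_{\mathbb{F}}(\xi) \mid \xi \in L^\infty,\ E[Z\xi] \le t\}, \qquad (6)$$

$H^+(Z, \cdot)$ is its right continuous version:

$$H^+(Z,t) := \inf_{s > t} H(Z,s) = \sup\{q \in \mathbb{R} \mid t \ge \gamma(Z,q)\}, \qquad (7)$$

and $\gamma : L^1 \times \mathbb{R} \to \mathbb{R}$ is defined by:

$$\gamma(Z,q) := \inf\{E[Z\xi] \mid \phi_{\mathbb{F}}(\xi) \ge q\}. \qquad (8)$$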

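Proof. Step 1: $\phi_{\mathbb{F}}(X) = \inf_{Z \in \mathcal{Z}} H(Z, E[ZX])$.

Fix $X \in L^\infty$. As $X \in \{\xi \in L^\infty \mid E[Z\xi] \le E[ZX]\}$, by the definition of
$H(Z, E[ZX])$ we deduce that, for all $Z \in L^1$,

$$H(Z, E[ZX]) \ge \phi_{\mathbb{F}}(X),$$

hence

$$\inf_{Z \in L^1} H(Z, E[ZX]) \ge \phi_{\mathbb{F}}(X). \qquad (9)$$

We prove the opposite inequality. Let $\varepsilon > 0$ and define the set

$$C_\varepsilon := \{\xi \in L^\infty \mid \phi_{\mathbb{F}}(\xi) \ge \phi_{\mathbb{F}}(X) + \varepsilon\}.$$

As $\phi_{\mathbb{F}}$ is quasi-concave and $\sigma(L^\infty, L^1)$-upper semicontinuous (Propositions 10
and 20), $C_\varepsilon$ is convex and $\sigma(L^\infty, L^1)$-closed. Since $X \notin C_\varepsilon$ (if $\phi_{\mathbb{F}}(X) = -\infty$,
we may take $C_M := \{\xi \in L^\infty \mid \phi_{\mathbb{F}}(\xi) \ge -M\}$ and the following argument would
hold as well), the Hahn-Banach theorem implies the existence of a continuous
linear functional that strongly separates X and $C_\varepsilon$, that is there exists $Z_\varepsilon \in L^1$
such that

$$E[Z_\varepsilon \xi] > E[Z_\varepsilon X] \quad \text{for all } \xi \in C_\varepsilon. \qquad (10)$$

Hence

$$\{\xi \in L^\infty \mid E[Z_\varepsilon \xi] \le E[Z_\varepsilon X]\} \subseteq C_\varepsilon^c := \{\xi \in L^\infty \mid \phi_{\mathbb{F}}(\xi) < \phi_{\mathbb{F}}(X) + \varepsilon\}$$

and from (9)

$$\phi_{\mathbb{F}}(X) \le \inf_{Z \in L^1} H(Z, E[ZX]) \le H(Z_\varepsilon, E[Z_\varepsilon X])$$
$$= \sup\{\phi_{\mathbb{F}}(\xi) \mid \xi \in L^\infty \text{ and } E[Z_\varepsilon \xi] \le E[Z_\varepsilon X]\}$$
$$\le \sup\{\phi_{\mathbb{F}}(\xi) \mid \xi \in L^\infty \text{ and } \phi_{\mathbb{F}}(\xi) < \phi_{\mathbb{F}}(X) + \varepsilon\} \le \phi_{\mathbb{F}}(X) + \varepsilon.$$

Therefore, $\phi_{\mathbb{F}}(X) = \inf_{Z \in L^1} H(Z, E[ZX])$. To show that the inf can be taken
over the positive cone $L^1_+$, it is sufficient to prove that $Z_\varepsilon \in L^1_+$. Let $Y \in L^\infty_+$
and $\xi \in C_\varepsilon$. Given that $\phi_{\mathbb{F}}$ is monotone increasing, $\xi + nY \in C_\varepsilon$ for every $n \in \mathbb{N}$
and, from (10), we have:

$$E[Z_\varepsilon(\xi + nY)] > E[Z_\varepsilon X] \ \Rightarrow\ E[Z_\varepsilon Y] > \frac{E[Z_\varepsilon(X - \xi)]}{n} \to 0 \quad \text{as } n \to \infty.$$

As this holds for any $Y \in L^\infty_+$ we deduce that $Z_\varepsilon \in L^1_+$. Therefore, $\phi_{\mathbb{F}}(X) =
\inf_{Z \in L^1_+} H(Z, E[ZX])$.

By definition of $H(Z,t)$,

$$H(Z, E[ZX]) = H(\lambda Z, E[(\lambda Z)X]) \quad \forall Z \in L^1_+,\ Z \ne 0,\ \lambda \in (0,\infty).$$

Hence we deduce

$$\phi_{\mathbb{F}}(X) = \inf_{Z \in L^1_+} H(Z, E[ZX]) = \inf_{Z \in \mathcal{Z}} H(Z, E[ZX]) = \inf_{Q \in \mathcal{P}} H(Q, E_Q[X]).$$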



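Step 2: $\phi_{\mathbb{F}}(X) = \inf_{Z \in \mathcal{Z}} H^+(Z, E[ZX])$.

Since $H(Z, \cdot)$ is increasing and $Z \in L^1_+$ we obtain

$$H^+(Z, E[ZX]) := \inf_{s > E[ZX]} H(Z, s) \le \lim_{X_m \downarrow X} H(Z, E[ZX_m]),$$

$$\phi_{\mathbb{F}}(X) = \inf_{Z \in L^1_+} H(Z, E[ZX]) \le \inf_{Z \in L^1_+} H^+(Z, E[ZX]) \le \inf_{Z \in L^1_+} \lim_{X_m \downarrow X} H(Z, E[ZX_m])$$
$$= \lim_{X_m \downarrow X} \inf_{Z \in L^1_+} H(Z, E[ZX_m]) = \lim_{X_m \downarrow X} \phi_{\mathbb{F}}(X_m) = \phi_{\mathbb{F}}(X),$$

where in the last equality we applied Lemma 21 that guarantees the continuity
from above of $\phi_{\mathbb{F}}$.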

Step 3: $H^+(Z,t) := \inf_{s>t} H(Z,s) = \sup\{q \in \mathbb{R} \mid \gamma(Z,q) \le t\}$, where $\gamma$ is
defined in (8).

Denote

$$S(Z,t) := \sup\{q \in \mathbb{R} \mid \gamma(Z,q) \le t\}, \quad (Z,t) \in L^1 \times \mathbb{R},$$

and note that $S(Z, \cdot)$ is the right inverse of the increasing function $\gamma(Z, \cdot)$ and
therefore $S(Z, \cdot)$ is right continuous.

To prove that $H^+(Z,t) \le S(Z,t)$ it is sufficient to show that for all $p > t$ we
have:

$$H(Z,p) \le S(Z,p). \qquad (11)$$

Indeed, if (11) is true

$$H^+(Z,t) = \inf_{p>t} H(Z,p) \le \inf_{p>t} S(Z,p) = S(Z,t),$$

as both $H^+$ and S are right continuous in the second argument.
Writing explicitly the inequality (11)

$$\sup\{\phi_{\mathbb{F}}(\xi) \mid E[Z\xi] \le p\} \le \sup\{q \in \mathbb{R} \mid \gamma(Z,q) \le p\}$$

and letting $\xi \in L^\infty$ satisfy $E[Z\xi] \le p$, we see that it is sufficient to show the
existence of $q \in \mathbb{R}$ such that $\gamma(Z,q) \le p$ and $q \ge \phi_{\mathbb{F}}(\xi)$. If $\phi_{\mathbb{F}}(\xi) = \infty$ then
$\gamma(Z,q) \le p$ for any q and therefore $S(Z,p) = H(Z,p) = \infty$.

Suppose now that $\infty > \phi_{\mathbb{F}}(\xi) > -\infty$ and define $q := \phi_{\mathbb{F}}(\xi)$. As $E[\xi Z] \le p$
we have:

$$\gamma(Z,q) := \inf\{E[Z\xi] \mid \phi_{\mathbb{F}}(\xi) \ge q\} \le p.$$

Then $q \in \mathbb{R}$ satisfies the required conditions.

To obtain $H^+(Z,t) := \inf_{p>t} H(Z,p) \ge S(Z,t)$ it is sufficient to prove that,
for all $p > t$, $H(Z,p) \ge S(Z,t)$, that is:

$$\sup\{\phi_{\mathbb{F}}(\xi) \mid E[Z\xi] \le p\} \ge \sup\{q \in \mathbb{R} \mid \gamma(Z,q) \le t\}. \qquad (12)$$

Fix any $p > t$ and consider any $q \in \mathbb{R}$ such that $\gamma(Z,q) \le t$. By the definition
of $\gamma$, for all $\varepsilon > 0$ there exists $\xi_\varepsilon \in L^\infty$ such that $\phi_{\mathbb{F}}(\xi_\varepsilon) \ge q$ and $E[Z\xi_\varepsilon] \le t + \varepsilon$.
Take $\varepsilon$ such that $0 < \varepsilon < p - t$. Then $E[Z\xi_\varepsilon] \le p$ and $\phi_{\mathbb{F}}(\xi_\varepsilon) \ge q$ and (12)
follows. ■

Remark 23 (Interpretation of formulas (7) and (8)) Let Q be the 'weight' that
we can assign to the author's publications (for example, the impact factor of the
Journal where the article is published). For a fixed Q, the term $\gamma(Q,q) :=
\inf\{E_Q[\xi] \mid \phi_{\mathbb{F}}(\xi) \ge q\}$ represents the smallest Q-average of citations that a
generic author needs in order to have a SRM of at least q. We observe that
this term is independent of the citations of the author X.

In light of these considerations we can interpret the term $H^+(Q, E_Q[X]) :=
\sup\{q \in \mathbb{R} \mid E_Q[X] \ge \gamma(Q,q)\}$ as the greatest performance level that the author
X can reach, in the case that we attribute the weight Q to the publications.
Namely, we compare the Q-average of the author X, $E_Q[X]$, with the minimum
Q-average necessary to reach each level q, that is $\gamma(Q,q)$.
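
This interpretation can be explored on a discretized toy model. The sketch below is not the construction of the theorem but a finite analogue under explicit assumptions: publications are the integer ranks 1, ..., `horizon`, a weight Z is a non-negative vector over ranks (left unnormalized, which is harmless because H is invariant under positive rescaling of Z), levels q are integers, and $\gamma(Z,q)$ is computed as the weighted sum of $f_q$, which presumes that the level sets $\{\phi_{\mathbb{F}} \ge q\}$ coincide with the performance sets. All function names are hypothetical, and the h-index family is used as the example.

```python
import random


def h_curve(q, i):
    # h-index family: q citations required on each of the first q ranks
    return q if i <= q else 0


def primal(x, curve, levels, horizon):
    # discrete analogue of (3): largest q whose curve lies below x at every rank
    return max(q for q in levels
               if all(x[i - 1] >= curve(q, i) for i in range(1, horizon + 1)))


def gamma(z, q, curve, horizon):
    # discrete analogue of gamma(Z, q), assumed equal to the Z-weighted sum of f_q
    return sum(z[i - 1] * curve(q, i) for i in range(1, horizon + 1))


def h_plus(z, t, curve, levels, horizon):
    # discrete analogue of H^+(Z, t) = sup{q : t >= gamma(Z, q)}
    return max(q for q in levels if t >= gamma(z, q, curve, horizon))


horizon = 12
levels = range(0, 12)
x = [10, 8, 5, 4, 3] + [0] * (horizon - 5)      # citation curve padded with zeros


def dual_value(z):
    t = sum(zi * xi for zi, xi in zip(z, x))    # Z-weighted citations of the author
    return h_plus(z, t, h_curve, levels, horizon)


# Weights concentrated on a single rank, and some random integer weights.
point_masses = [[1 if j == i else 0 for j in range(horizon)] for i in range(horizon)]
random.seed(0)
random_weights = [[random.randint(0, 5) for _ in range(horizon)] for _ in range(200)]

phi = primal(x, h_curve, levels, horizon)
print(phi)                                                   # 4
print(min(dual_value(z) for z in point_masses))              # 4
print(all(dual_value(z) >= phi for z in random_weights))     # True
```

In this toy setting every weight gives a level at least as large as the primal value, and the smallest level is already reached by a weight concentrated on a single rank, which is the discrete counterpart of taking the infimum over Q.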

3.1 Examples 

In the following examples we find the dual representation of some existing indices.
In all these examples the family $\mathbb{F}$ of performance curves is left continuous;
hence, by Proposition 20, the associated SRM $\phi_{\mathbb{F}}$ is $\sigma(L^\infty, L^1)$-upper semicontinuous
and, from (4), X satisfies:

$$\phi_{\mathbb{F}}(X) \ge q \iff X \in \mathcal{A}_q \iff X \ge f_q.$$

Therefore, we find the dual representation computing $\gamma$, $H^+$ and $\phi_{\mathbb{F}}$ applying
the formulas (8), (7) and (5). Let $X \in L^\infty_+$, $Z \in L^1_+$, $q \in \mathbb{R}_+$. Then:

$$\gamma(Z,q) := \inf_{\phi_{\mathbb{F}}(X) \ge q} E[ZX] = \inf_{X \ge f_q} E[ZX] = E[Zf_q].$$

Recall that $X = \sum_{i=1}^{p} x_i \mathbf{1}_{(i-1,i]}$, with $x_i \ge x_{i+1} \ge 1$ for all i, and that p
satisfies $X = X\mathbf{1}_{(0,p]} \in L^\infty_+$.

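Example 24 (max # of citations) Consider the example, introduced earlier, where $f_q = q 1_{(0,1]}$. Then:

$$\gamma(Z,q) = q E[Z 1_{(0,1]}]$$

and we obtain

$$H^+(Z,E[ZX]) := \sup\{q \in \mathbb{R} \mid E[ZX] \ge q E[Z 1_{(0,1]}]\} = \frac{E[ZX]}{E[Z 1_{(0,1]}]}$$

(we may assume that $E[Z 1_{(0,1]}] \ne 0$, otherwise $H^+(Z,E[ZX]) = +\infty$ and it does not contribute to $\phi_{\mathbb{F}}$). In our application, any non zero citation vector $X$ always satisfies $X \ge x_1 1_{(0,1]}$ and, since $E[X 1_{(0,1]}] = x_1 E[1_{(0,1]}]$, we also have $X \ge \frac{E[1_{(0,1]} X]}{E[1_{(0,1]}]} 1_{(0,1]}$. Therefore,

$$E\left[ Z \, \frac{E[1_{(0,1]} X]}{E[1_{(0,1]}]} \, 1_{(0,1]} \right] \le E[ZX] \quad \forall Z \in L^1_+(\mathbb{R})$$

and

$$\frac{E[ZX]}{E[Z 1_{(0,1]}]} \ge \frac{E[1_{(0,1]} X]}{E[1_{(0,1]}]} \quad \forall Z \in L^1_+(\mathbb{R}).$$

Hence:

$$\phi_{\mathbb{F}}(X) = \inf_{Z \in L^1_+(\mathbb{R})} H^+(Z,E[ZX]) = \inf_{Z \in L^1_+(\mathbb{R})} \frac{E[ZX]}{E[Z 1_{(0,1]}]} = \frac{E[1_{(0,1]} X]}{E[1_{(0,1]} 1_{(0,1]}]} = x_1,$$

i.e. the infimum is attained at $Z = 1_{(0,1]} \in L^1_+$, which is of course natural as this SRM weights only the first publication.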



Example 25 (total # of publications) Consider the example, introduced earlier, where $f_q = 1_{(0,q]}$. Then

$$\gamma(Z,q) = E[Z 1_{(0,q]}]$$

and we obtain

$$H^+(Z,E[ZX]) := \sup\{q \in \mathbb{R} \mid E[ZX] \ge E[Z 1_{(0,q]}]\}.$$

Hence the dual representation of the total number of publications with at least one citation is

$$\phi_{\mathbb{F}_p}(X) = \inf_{Z \in L^1_+} \; \sup_{E[ZX] \ge E[Z 1_{(0,q]}]} q.$$

We show indeed that $\phi_{\mathbb{F}_p}(X) = p$, where $p$ is characterized by $X = X 1_{(0,p]}$. For all $Z \in L^1_+$ and $q \le p$ we have

$$E[ZX] = E[ZX 1_{(0,p]}] \ge E[1_{(0,q]} Z]$$

and therefore

$$\sup_{E[ZX] \ge E[Z 1_{(0,q]}]} q \ge p \quad \forall Z \in L^1_+,$$

and $\phi_{\mathbb{F}_p}(X) \ge p$. Regarding the $\le$ inequality, it is enough to take $Z = 1_{(p,p+\delta]}$, with $\delta > 0$. In this case, the condition $E[ZX] \ge E[Z 1_{(0,q]}]$ becomes

$$0 = E[1_{(p,p+\delta]} X] \ge E[1_{(p,p+\delta]} 1_{(0,q]}]$$

that holds only for $q \le p$, hence

$$H^+(Z,E[ZX]) = \sup_{E[1_{(p,p+\delta]} X] \ge E[1_{(p,p+\delta]} 1_{(0,q]}]} q = p$$

and $\phi_{\mathbb{F}_p}(X) \le p$.

Example 26 (h-index) Consider the example, introduced earlier, where $f_q = q 1_{(0,q]}$. Then

$$\gamma(Z,q) = E[Z q 1_{(0,q]}]$$

and we obtain

$$H^+(Z,E[ZX]) := \sup\{q \in \mathbb{R} \mid E[ZX] \ge E[Z q 1_{(0,q]}]\}.$$

Hence the dual representation of the h-index is

$$\phi_{\mathbb{F}_h}(X) = \inf_{Z \in L^1_+(\mathbb{R}_+)} \; \sup_{E[ZX] \ge E[Z q 1_{(0,q]}]} q.$$

We indeed show that $\phi_{\mathbb{F}_h}(X) = h$, where $h$ is characterized by $X 1_{(0,h]} \ge h 1_{(0,h]}$ and $X 1_{(h,+\infty)} \le h 1_{(h,+\infty)}$. First we check that $\phi_{\mathbb{F}_h}(X) \ge h$. For all $Z \in L^1_+$ and $q \le h$ we have

$$E[ZX] \ge E[ZX 1_{(0,h]}] \ge E[Z h 1_{(0,h]}] \ge E[Z q 1_{(0,q]}],$$

so that

$$\sup_{E[ZX] \ge E[Z q 1_{(0,q]}]} q \ge h \quad \forall Z \in L^1_+$$

and $\phi_{\mathbb{F}_h}(X) \ge h$.

Regarding the $\le$ side, take $Z = 1_{(h,h+\delta]}$ with $\delta > 0$. For all $q > h$ there exists $\delta > 0$ such that $h + \delta \le q$ and then

$$E[1_{(h,h+\delta]} X] \le E[1_{(h,h+\delta]} h] < E[1_{(h,h+\delta]} q 1_{(0,q]}],$$

so that

$$\sup_{E[1_{(h,h+\delta]} X] \ge E[1_{(h,h+\delta]} q 1_{(0,q]}]} q \le h$$

and $\phi_{\mathbb{F}_h}(X) \le h$.
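As a numerical sanity check of Examples 24-26, the following Python sketch evaluates the dual formula on a discretized version of the problem. It is only an illustration under stated assumptions: levels q are integers, each publication occupies one unit interval, and the weights Z are restricted to indicators of single publications, which suffices for these three families because each f_q is nondecreasing in q. The names `dual_index`, `max_citations_curve`, `publications_curve` and `h_curve` are hypothetical and do not come from the paper.

```python
def max_citations_curve(q, i):
    # f_q = q * 1_(0,1]: only the first publication must reach level q.
    return q if i == 1 else 0


def publications_curve(q, i):
    # f_q = 1_(0,q]: the first q publications need at least one citation.
    return 1 if i <= q else 0


def h_curve(q, i):
    # f_q = q * 1_(0,q]: the first q publications need at least q citations.
    return q if i <= q else 0


def dual_index(citations, curve):
    """Discretized inf over Z of sup{q : E[ZX] >= E[Z f_q]}.

    Assumption: Z ranges over indicators of single publications, so
    E[ZX] = x_i and gamma(Z, q) = E[Z f_q] = curve(q, i).
    """
    # Sort decreasingly and append one zero-citation slot, so that the
    # region beyond the last publication is also tested.
    xs = sorted(citations, reverse=True) + [0]
    q_max = max(len(xs), xs[0]) + 1  # finite grid standing in for +infinity
    levels = []
    for i, x_i in enumerate(xs, start=1):
        # H^+ for the weight concentrated on publication i.
        feasible = [q for q in range(q_max + 1) if x_i >= curve(q, i)]
        levels.append(max(feasible))
    return min(levels)  # robust (worst-case) value over the tested weights


record = [3, 0, 6, 1, 5]
print(dual_index(record, max_citations_curve))  # 6, the largest citation count
print(dual_index(record, publications_curve))   # 4, publications with >= 1 citation
print(dual_index(record, h_curve))              # 3, the h-index
```

In each case the minimum is reached at a weight concentrated on a single publication, in line with the examples, where the infimum is attained at an indicator.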



3.2 On the dual approach to SRM 



The dual representation in Theorem 22 and Remark 23 suggest another approach for the definition of a class of SRMs.

In other words, what is the interpretation of the duality that we are discovering?

The primal space is given by the set of all the possible author's citation records, i.e. by all the random variables $X(\omega)$ defined on the events $\omega \in \Omega$, where each event now corresponds to the journal in which the paper is published.

The dual space is then represented by all possible linear valuations (the "Arrow-Debreu prices") of the journals.

We may fix a plausible family of probabilities $\mathcal{P} \subseteq \{Q \ll P\}$ where each $Q(\omega)$ then represents the 'value' attributed to the journal $\omega \in \Omega$. The valuation criterion for journals (i.e. the selection of the family $\mathcal{P}$) has to be determined a priori and could be based on the 'impact factor' or other criteria. A specific $Q$ could attribute more importance to the journals with a large number of citations (a large impact factor); another particular $Q$ to the journals having a "high quality". A priori there will be no consensus on the selection of the family $\mathcal{P}$, hence a robust approach is needed.

As suggested by the dual representation results, we consider, independently of the particular scientist $X$, a family $\{\gamma_\beta\}_{\beta \in \mathbb{R}}$ of functions $\gamma_\beta : \mathcal{P} \to \overline{\mathbb{R}}$ that associate to each $Q$ the value $\gamma_\beta(Q)$, which represents the smallest $Q$-average of citations needed to reach a quality index of at least $\beta$.

So, given a particular value $Q(\omega_i)$ for each $i$-th journal and the average citations $\gamma_\beta(Q)$ necessary to have an index level greater than $\beta$, we build the SRM in the following way. We define the function $H^+ : \mathcal{P} \times \mathbb{R} \to \overline{\mathbb{R}}$ that associates to each pair $(Q, E_Q(X))$ the number

$$H^+(Q, E_Q(X)) := \sup\{\beta \in \mathbb{R} \mid E_Q(X) \ge \gamma_\beta(Q)\},$$

which represents the greatest quality index that the author $X$ can reach when $Q$ is fixed, and we build the SRM as follows:

$$\Phi(X) := \inf_{Q \in \mathcal{P}} H^+(Q, E_Q(X)),$$

which represents a prudential and robust approach with respect to $\mathcal{P}$, the plausible different selections of the evaluation of the journals. This SRM is by construction quasi-concave and monotone increasing. Theorem 22 exhibits the relationship between the performance curve approach and this dual approach.
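The construction above can be sketched in a few lines of Python for a finite set of journals. This is a minimal illustration, not the authors' implementation: the family of weights, the grid of levels, and in particular the threshold function `linear_gamma` (which simply requires a Q-average of at least beta for every Q) are hypothetical choices made only to have something concrete to run.

```python
def q_average(q, x):
    # E_Q[X] on a finite set of journals: weights q, citation record x.
    return sum(w * c for w, c in zip(q, x))


def h_plus(q, x, gamma, betas):
    # sup{beta : E_Q[X] >= gamma_beta(Q)}, restricted to a finite grid of levels.
    avg = q_average(q, x)
    feasible = [b for b in betas if avg >= gamma(b, q)]
    return max(feasible) if feasible else float("-inf")


def dual_srm(x, family, gamma, betas):
    # Phi(X) = inf over Q in the family of H^+(Q, E_Q[X]).
    return min(h_plus(q, x, gamma, betas) for q in family)


def linear_gamma(beta, q):
    # Hypothetical threshold: level beta needs a Q-average of at least beta,
    # whatever the journal weighting Q is.
    return float(beta)


citations = [10, 4, 1]      # citations collected in three journals
family = [
    [0.5, 0.3, 0.2],        # weighting that favours the first journal
    [0.2, 0.3, 0.5],        # weighting that favours the third journal
]
levels = range(0, 11)       # integer grid of quality levels
print(dual_srm(citations, family, linear_gamma, levels))  # 3
```

The first weighting gives a Q-average of about 6.4 and the second about 3.7, so the robust index is driven by the least favourable weighting, which is the prudential behaviour described in the text.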



4 Example of the calibration of a SRM 

Since the SRM introduced in Section 2 depends on the particular family F 
of performance curves, in this section we provide an example showing how to 
calibrate the family F from the historic data available for one particular scientific 
area and seniority. In this way, the SRM will fit appropriately the characteristics 
of the research field and seniority under consideration. We recall that the SRM 
should be used only in relative terms (to compare the author quality with respect 
to the other researchers in the same area) in order to classify the authors (and 
structures) into few classes of homogeneous research quality. 
4.1 Determination of the family $\{f_q\}_q$ and of the SRM

The first step consists in the selection of a representative sample of $M$ authors in the same scientific area and with the same seniority. From this sample we then need to extrapolate the family of curves $\{f_q\}_q$ that best represents the citation curve of the area and seniority. The analysis of the citation vectors of each author (see Fig. 3) shows that the theoretical model may be described (for this particular scientific area) by the formula

$$f_q(x) = \frac{q}{x^{\beta}} \qquad (13)$$

with $q, \beta \in \mathbb{R}_+$. Setting $\ln f_q = Y$, $\ln(q) = \tilde{q}$, $\ln x = X$, we obtain the linearized model

$$Y = \tilde{q} - \beta X. \qquad (14)$$

Figure 3: Citation curves of 20 senior authors in Math Finance area.

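For each $i$-th author of the sample we determine the $\beta_i$ that minimizes the sum of the squared distances of the points from the line (14). Then, we compute $\beta$ as the average of the $\beta_i$:

$$\beta = \frac{1}{M} \sum_{i=1}^{M} \beta_i.$$

Once the parameter $\beta$ is fixed, we obtain the family of performance curves $f_q(x) = \frac{q}{x^{\beta}}$ and then the associated SRM (hereafter called the $\phi$-index) is:

$$\phi(X) = \sup\left\{ q \in \mathbb{R} \;\middle|\; X(x) \ge \frac{q}{x^{\beta}} \;\; \forall x \in (0,p] \right\} \qquad (15)$$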
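The calibration steps above can be sketched as follows. This is a minimal illustration under assumptions that the text does not spell out: the "square distances" are taken to be vertical distances (ordinary least squares on the log-log points), the citation curve is evaluated at the integer ranks i = 1, ..., p, and publications with zero citations are left out of the fit. The function names `fit_beta`, `calibrate` and `phi_index` are hypothetical.

```python
import math


def fit_beta(citations):
    """Least-squares slope of ln(x_i) against ln(i); returns beta = -slope.

    Assumes at least two publications with a positive citation count.
    """
    xs = sorted(citations, reverse=True)
    pts = [(math.log(i), math.log(x)) for i, x in enumerate(xs, start=1) if x > 0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    return -sxy / sxx  # the model is Y = q~ - beta * X, so beta is minus the slope


def calibrate(sample):
    # beta is the average of the individual beta_i over the M authors.
    betas = [fit_beta(author) for author in sample]
    return sum(betas) / len(betas)


def phi_index(citations, beta):
    """sup{q : x_i >= q / i**beta for every cited publication i}.

    At integer ranks this supremum equals min_i x_i * i**beta.
    """
    xs = sorted(citations, reverse=True)
    values = [x * i ** beta for i, x in enumerate(xs, start=1) if x > 0]
    return min(values) if values else 0.0


# Synthetic authors whose citation curves follow the model exactly.
author_1 = [1000 / i ** 1.5 for i in range(1, 11)]
author_2 = [500 / i ** 2.0 for i in range(1, 11)]
beta = calibrate([author_1, author_2])
print(beta, phi_index(author_1, beta))
```

For real citation records the fitted slope depends on how ties and uncited papers are handled, so the exponent obtained this way should be read as indicative only.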
4.2 The empirical results

We have chosen a group of 20 well established researchers in the mathematical finance area. We have computed the $\beta_i$ for each author and we have found that $\beta = 1.62$.

In the following table (Fig. 4a) we report the results and the respective ranking obtained calculating the $\phi$-index as in (15) and the $h$-index for each author. Fig. 4b shows that the hyperbola-type curve (red line) corresponding to the author's $\phi$-index is always below his citation curve (blue line), in the domain $(0,p)$.



Author      $\phi$-index   rank $\phi$-index   h-index   rank h-index
Author A    4423           1                   53        2
Author B    2985           2                   60        1
Author D    1235           3                   35        3
Author E    1136           4                   35        4
Author F    950            5                   25        14
Author C    908            6                   23        7
Author R    875            7                   23        8
Author T    800            8                   29        6
Author P    730            9                   23        9
Author H    723            10                  33        5
Author Q    511            11                  26        11
Author L    451            12                  24        15
Author Q    449            13                  20        17
Author M    417            14                  27        10
Author J    318            15                  26        12
Author N    304            16                  17        19
Author I    240            17                  26        13
Author      221            18                  15        20
Author K    180            19                  23        16
Author S    127            20                  18        18




Figure 4: (a) left - (b) right 



Notice that the author F improves his ranking, from the 14th position in the $h$-index to the 5th one in the $\phi$-index. If we compare this author with the author J, we note that both have almost the same $h$-index but the $\phi$-index of F is much greater than the $\phi$-index of J. Analyzing their citation curves we observe that they have the same number of publications, but F has in general many more citations per publication than J, especially for those in the Hirsch-core. The same reasons explain the different ranking of the authors H and D.

The conclusion is that the calibrated family of performance curves $\mathbb{F}$ correctly takes into account the balance between the number of publications and the citations per publication, which is indeed a characteristic of the specific area under consideration.
5 Appendix

We provide a brief summary of the positive and negative features of the peer review process and of the bibliometric indices. We are well aware that these remarks are incomplete and represent a subjective, not scientifically based, view of this complex and controversial theme. In the references, as outlined in the introduction, more detailed arguments can be found. The following remarks, however, show that many possible drawbacks of bibliometric indices may be smoothed and reduced by an appropriate use of them and by the selection of a more convenient class of indices of the type we presented in the previous sections.

5.1 Summary of the pros and cons of context evaluation 
(bibliometric indices) 

Pros: 

• Easily accessible, from the online databases (Google Scholar, ISI Web, 
MathSciNet, Scopus...); 

• Not expensive: can be used systematically, especially if tested - every n 
years - with peer review. 

• Quick to compute 

• "Objective" , in the reductive meaning of being independent from indi- 
vidual judgements. 

Cons: 

• Subjective interpretation of citations, as it can be more subjective 
than the judgment of experts - see Citation Statistics Report of the Inter- 
national Mathematical Union (2008) [CIT] . 

— The new metric must be validated against other (possibly non-metric) criteria already validated.

— It has been pointed out - see the discussion in the American Scientist Open Access Forum, 2008 [ASOAF] - that citation metrics are very strongly correlated with peer reviews.

• Improper comparison of papers belonging to different fields. 

— The SRM should be used to rank each author inside his scientific 
community (e.g.: top 10% - top 30% - average...). It provides relative 
- to fields - values, not absolute values. However, this allows also for 
a coarse comparison of authors belonging to different areas, in the 
sense that it is possible to easily recognize the authors that are in 
the same (top/lower/ ...) merit class in each area. 
• Improper comparison of papers having different ages.

— Our SRM may be calibrated to different ages (as well as different areas).

• Different databases provide different citations.

— Many areas (naturally) share the same database.

— The outcome of the scientific measure is in relative terms: the ranking of one author is compared with the ranking of all researchers in the same area (hence using the same database).

— Different databases (Google Scholar, MathSciNet, ...) provide different numbers (in terms of citations of each paper), but only via a scaling factor: the overall ranking of the papers, with respect to the number of citations received, remains essentially the same, see [ASOAF].

• Co-authors.

— It is possible to normalize the citation numbers per each single author. For some fields (where papers typically have many co-authors) this may be problematic.

• Incorrect citations attributed to an author and self citations.

— Both problems can be easily addressed by the systematic use of Author Codes (a code that identifies the author).

• A single number is insufficient for the evaluation of a complex feature, such as scientific research.

— We agree: it is necessary to find multiple metrics (including time-based metrics). We propose one of them.

— This argument should not lead us to abandon the search for appropriate multiple metrics.

• Quality of the scientific research cannot be reduced to citations.

— Agreed: it is only one component, which however should be properly quantified.

• Negative credit: citations may be attributed not as reward citations (to give credit to the work of the cited author) but as negative credit (or "rhetorical credit" due to the prestige of the cited author).

— True. There are many motivations for citations and they vary among authors: they do not always reflect reward, but certainly a large percentage of citations are credit ones. Indeed:

— The fact that citation-based statistics often agree with other validated forms of valuation (peer review), see [ASOAF], suggests that, to some degree, these metrics indeed reflect the impact of the author's research.

— The periodical peer review valuation should point out the macroscopic exceptions to reward citations (papers mostly cited for their fallacy).

• Disincentive for young researchers to study subjects that are more innovative but less popular.

— True, even though this could be compensated by the consideration that innovative papers (in a new field) typically receive many citations.

• Negative Implications: The use of citation-based metrics will increase the number of citations (and improper ones).

— The abuse of citations is comparable with intentional misjudgment by a referee: unfortunately this is always possible.

— When citation numbers are high (in the order of hundreds) it is difficult to modify the citation records with self or friendly citations.

— It is not completely unfair that a strong scientific group (capable of producing a large number of published papers) receives additional credit (due to potential additional citations from the group).

5.2 Summary of the pros and cons of content valuation 
(peer review) 

Pros: 

• effective assessment of the quality of the research.

Cons:

• expensive, in terms of time and people involved: it cannot be used systematically.

• subjective, since the result depends on the referees: do they operate properly, are they competent and reliable? The choice of the referees is a very delicate issue.

• non-uniformity of the judgment, as each evaluator has personal scaling preferences leading to different rankings (especially in different areas).